For the nonparametric estimation of multivariate finite mixture models with the conditional
independence assumption, we propose a new formulation of the objective function in terms
of penalized smoothed Kullback-Leibler distance. The nonlinearly smoothed
majorization-minimization (NSMM) algorithm is derived from this perspective. An elegant
representation of the NSMM algorithm is obtained using a novel projection-multiplication
operator, a more precise monotonicity property of the algorithm is discovered, and the
existence of a solution to the main optimization problem is proved for the first time.

1. Introduction


In recent years, several studies have advanced the development of estimation algorithms,
based on expectation-maximization (EM) and its generalization called
majorization-minimization (MM), for nonparametric estimation of conditional independence
multivariate finite mixture models. The idea for these algorithms had its genesis in the
stochastic EM algorithm of Bordes et al. (2007) and was later extended to a deterministic
algorithm by Benaglia et al. (2009) and Benaglia et al. (2011). These algorithms were
placed on a more stable theoretical foundation by the ascent property established by
Levine et al. (2011). A detailed account of these algorithms, along with the related theory
of parameter identifiability, is presented in the survey article by Chauveau et al. (2015).
This paper follows up on this line of research, extending the theoretical foundations of
this method and deriving novel results while also simplifying their formulation.

Conditional independence multivariate finite mixture models have fundamental importance
in both statistical theory and applications; for example, as Chauveau et al. (2015)
point out, these models are related to the random-effects models of Laird and Ware
(1982). The basic setup assumes that the $r$-dimensional vectors
$\mathbf{X}_i = (X_{i,1}, X_{i,2}, \ldots, X_{i,r})^\top$, $1 \le i \le n$, are a simple random sample
from a finite mixture of $m > 1$ components with positive mixing proportions
$\lambda_1, \lambda_2, \ldots, \lambda_m$ that sum to 1, and density functions
$f_1, f_2, \ldots, f_m$. Here, we assume $m$ is known. For recent work that addresses the
estimation of $m$, along with a different approach to the estimation of the model parameters
than the one outlined here, see Bonhomme et al. (2014) and Kasahara and Shimotsu

( [Mil ). 

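The conditional independence assumption, which arises naturally in the analysis of data
with repeated measurements, says each $f_j$, $1 \le j \le m$, is equal to the product of its
marginal densities $f_{j,1}, \ldots, f_{j,r}$. Thus, the mixture density is

$$ g_{\theta}(\mathbf{x}) = \sum_{j=1}^{m} \lambda_j f_j(\mathbf{x}) = \sum_{j=1}^{m} \lambda_j \prod_{k=1}^{r} f_{j,k}(x_k) \tag{1} $$

for any $\mathbf{x} = (x_1, \ldots, x_r)^\top \in \mathbb{R}^r$. This is often regarded as a semiparametric model with
$\lambda_1, \ldots, \lambda_m$ being the Euclidean parameters and $f_{j,k}$, $1 \le j \le m$, $1 \le k \le r$, being the
functional parameters. Let $\theta$ denote all of these parameters.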
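To make the structure of the mixture density (1) concrete, the following minimal sketch evaluates a conditional independence mixture at a single point. It is only an illustration: the normal marginals, the function names `normal_pdf` and `mixture_density`, and all numerical values are assumptions introduced here, since the model itself leaves the marginals $f_{j,k}$ fully unspecified.

```python
import math

def normal_pdf(x, mean, sd):
    """Univariate normal density, used only as a stand-in for a marginal f_{j,k}."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_density(x, lambdas, means, sds):
    """Evaluate sum_j lambda_j * prod_k f_{j,k}(x_k) for one r-vector x.

    means[j][k] and sds[j][k] parametrise the (assumed normal) marginal f_{j,k}.
    """
    total = 0.0
    for lam, mu_j, sd_j in zip(lambdas, means, sds):
        component = 1.0
        for x_k, mu, sd in zip(x, mu_j, sd_j):
            component *= normal_pdf(x_k, mu, sd)  # conditional independence
        total += lam * component
    return total

# m = 2 components, r = 3 coordinates (illustrative values only).
lambdas = [0.4, 0.6]
means = [[0.0, 0.0, 0.0], [3.0, 3.0, 3.0]]
sds = [[1.0, 1.0, 1.0], [1.0, 2.0, 0.5]]
value = mixture_density([0.5, -0.2, 1.0], lambdas, means, sds)
```

Each component contributes a product over coordinates, which is exactly the conditional independence assumption; the nonparametric setting replaces `normal_pdf` by unknown functions.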

The identifiability of the parameters in the model (1) was not clear until the
breakthrough in Hall and Zhou (2003), which established the identifiability when $m = 2$ and
$r \ge 3$. Some follow-up work appeared, for example, Hall et al. (2005) and Kasahara and
Shimotsu (2009), until the fundamental result that established generic identifiability of
(1) for $r \ge 3$ was obtained (Allman et al. 2009) based on an algebraic result of Kruskal
(1976, 1977).


Bordes et al. (2007) proposed a stochastic nonparametric EM (npEM) algorithm for the
estimation of semiparametric mixture models. Benaglia et al. (2009) and Benaglia et al.
(2011) proposed a deterministic version of the algorithm for the estimation of (1) and
studied bandwidth selection related to it. However, all these algorithms lack an objective
function as well as the descent property which characterizes any traditional EM algorithm
(Dempster et al. 1977). A significant improvement comes from Levine et al. (2011), which
proposes a smoothed likelihood as the objective function and leads to a smoothed version
of the npEM that does possess the desired descent property. The authors point out the
similarities between their approach and the one in Eggermont (1999) for non-mixtures.
However, the constraints imposed by the condition that each $f_{j,k}$ must integrate to one
lead to tricky optimization issues and necessitate a slightly awkward normalization step
to satisfy these constraints. In reformulating the parameter space, the current paper
removes the constraints and provides a rigorous justification for the algorithm, proving
the existence of a solution to the main optimization problem for the first time. In
addition, this paper sharpens the descent property by deriving a positive lower bound on
the size of the decrease in the objective function at each iteration.


2. Reframing the Estimation Problem 

In the following, we first consider an ideal setting where the target density is known (i.e.,
the sample size is infinite). Then we replace the target density by its empirical version
and obtain the discrete algorithm.


2.1. Setup and Notation 

Let $\mathbf{x} = (x_1, x_2, \ldots, x_r)^\top \in \mathbb{R}^r$ and let $g$ denote a target density on $\mathbb{R}^r$, with support in
the interior of $\Omega$, where $\Omega$ is a compact and convex set in $\mathbb{R}^r$. Without loss of generality,
assume $\Omega$ is the closed $r$-dimensional cube $[a, b]^r$. We are interested in the case when $g$
is a finite mixture of products of fully unspecified univariate measures, with unknown
mixing parameters.

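We make the following assumptions:

(i) Let the number of mixing components in $g$ be fixed and denoted by $m$. There
exist nonnegative functions $e_j(\mathbf{x})$, $1 \le j \le m$, such that

$$ g(\mathbf{x}) = \sum_{j=1}^{m} e_j(\mathbf{x}). \tag{2} $$

(ii) For each $1 \le j \le m$,

$$ e_j(\mathbf{x}) = \theta_j \prod_{k=1}^{r} e_{j,k}(x_k), \tag{3} $$

where $\theta_j > 0$ and for each $k$, $1 \le k \le r$, $e_{j,k} \in L^1(\mathbb{R})$ is positive with support in
$[a, b]$. Hence each $e_j(\mathbf{x})$ is in $L^1(\mathbb{R}^r)$, positive, and with support in $\Omega$.

Given a bandwidth $h \in \mathbb{R}$, let $s_h(\cdot, \cdot) \in L^1(\mathbb{R} \times \mathbb{R})$ be nonnegative and with support in
$[a, b] \times [a, b]$, such that

(iii) For $v, z \in \mathbb{R}$,

$$ \int s_h(v, u)\, du = \int s_h(u, z)\, du = 1. \tag{4} $$

(iv) There exist positive numbers $M_1(h)$ and $M_2$ such that for any $v, z \in [a, b]$,

$$ M_1(h) \le s_h(v, z) \le M_2. \tag{5} $$

(v) The function $s_h$ has continuous first-order partial derivatives on $(a, b) \times (a, b)$ and
there exists a constant $B$ such that for any $u, x \in (a, b)$,

$$ \left| \frac{\partial}{\partial v} s_h(v, z) \Big|_{v=u} \right| \le B \quad \text{and} \quad \left| \frac{\partial}{\partial z} s_h(v, z) \Big|_{z=x} \right| \le B. \tag{6} $$

(vi) If we define $f_j(\mathbf{x}) = e_j(\mathbf{x}) / \int e_j(\mathbf{z})\, d\mathbf{z}$, then

$$ f_j(\mathbf{x}) \ge (M_1(h))^r \tag{7} $$

for all $\mathbf{x} \in \Omega$ and for each $j \in \{1, 2, \ldots, m\}$.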


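Before stating the optimization problem, we define the smoothing operators $\mathcal{S}_h$, $\mathcal{S}_h^*$,
and $\mathcal{N}_h$ as follows.

For any $f \in L^1(\mathbb{R}^r)$, let

$$ (\mathcal{S}_h f)(\mathbf{x}) = \int s_h(\mathbf{x}, \mathbf{u}) f(\mathbf{u})\, d\mathbf{u} \quad \text{and} \quad (\mathcal{S}_h^* f)(\mathbf{x}) = \int s_h(\mathbf{u}, \mathbf{x}) f(\mathbf{u})\, d\mathbf{u}, \tag{8} $$

where

$$ s_h(\mathbf{x}, \mathbf{u}) = \prod_{k=1}^{r} s_h(x_k, u_k) \quad \text{for } \mathbf{x}, \mathbf{u} \in \mathbb{R}^r. \tag{9} $$

Furthermore, let

$$ (\mathcal{N}_h f)(\mathbf{x}) = \begin{cases} \exp\left[ (\mathcal{S}_h^* \log f)(\mathbf{x}) \right] & \text{for } \mathbf{x} \in \Omega, \\ 0 & \text{elsewhere.} \end{cases} \tag{10} $$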
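As a rough illustration of the operators in (8) and (10), the sketch below discretizes the univariate case ($r = 1$) on an equally spaced grid with the rectangle rule. The Gaussian kernel shape, the column renormalization that enforces a discrete analogue of $\int s_h(u, x)\, du = 1$, and the helper names `column_normalised_kernel`, `s_star` and `n_h` are all assumptions made for this example rather than part of the paper.

```python
import math

def column_normalised_kernel(grid, h):
    """K[a][b] approximates s_h(grid[a], grid[b]); every column integrates to 1
    under the rectangle rule (Gaussian shape and rescaling are assumptions)."""
    du = grid[1] - grid[0]
    n = len(grid)
    raw = [[math.exp(-0.5 * ((grid[a] - grid[b]) / h) ** 2) for b in range(n)]
           for a in range(n)]
    col_mass = [sum(raw[a][b] for a in range(n)) * du for b in range(n)]
    K = [[raw[a][b] / col_mass[b] for b in range(n)] for a in range(n)]
    return K, du

def s_star(K, du, f):
    """(S_h^* f)(x_b) = integral of s_h(u, x_b) f(u) du, by the rectangle rule."""
    n = len(f)
    return [sum(K[a][b] * f[a] for a in range(n)) * du for b in range(n)]

def n_h(K, du, f):
    """(N_h f)(x_b) = exp[(S_h^* log f)(x_b)] for a strictly positive f."""
    return [math.exp(v) for v in s_star(K, du, [math.log(fa) for fa in f])]

grid = [a / 40.0 for a in range(41)]                      # [0, 1]
K, du = column_normalised_kernel(grid, h=0.1)
f = [1.0 + 0.5 * math.sin(2.0 * math.pi * u) for u in grid]
linear = s_star(K, du, f)       # linear smoothing
nonlinear = n_h(K, du, f)       # nonlinear (geometric-mean) smoothing
```

Because the weights `K[a][b] * du` sum to one over `a`, the nonlinear smoother is a weighted geometric mean and the linear smoother is the matching arithmetic mean, so `nonlinear[b] <= linear[b]` at every grid point.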


These smoothing operators are well-known and have many desirable properties (Eggermont
1999). For instance, Lemma 1.1 of Eggermont (1999) states that for any nonnegative
functions $g_1$ and $g_2$ in $L^1(\mathbb{R}^r)$,

$$ KL(\mathcal{S}_h g_1, \mathcal{S}_h g_2) \le KL(g_1, g_2), \tag{11} $$

where $KL$ is the Kullback-Leibler divergence defined by

$$ KL(g_1, g_2) = \int \left[ g_1 \log \frac{g_1}{g_2} + g_2 - g_1 \right]. \tag{12} $$
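The divergence in (12) and the contraction property (11) can be checked numerically on a small discrete example. The sketch below is only an illustration under stated assumptions: functions are replaced by vectors on a three-point grid with unit cell size, and the smoothing operator by a matrix `K` whose columns sum to one (a discrete counterpart of $\int s_h(x, u)\, dx = 1$). The names `kl` and `smooth` and all numbers are hypothetical.

```python
import math

def kl(g1, g2, du=1.0):
    """Generalised Kullback-Leibler divergence of (12) on a grid:
    sum of g1*log(g1/g2) + g2 - g1, times the cell size du."""
    total = 0.0
    for a, b in zip(g1, g2):
        total += (a * math.log(a / b) if a > 0.0 else 0.0) + b - a
    return total * du

def smooth(K, g):
    """(S g)[x] = sum_u K[x][u] * g[u] for a column-stochastic matrix K."""
    return [sum(K[x][u] * g[u] for u in range(len(g))) for x in range(len(K))]

K = [[0.7, 0.2, 0.0],
     [0.3, 0.6, 0.5],
     [0.0, 0.2, 0.5]]          # each column sums to 1
g1 = [0.2, 0.5, 0.3]
g2 = [0.6, 0.1, 0.4]           # need not have the same total mass as g1
before = kl(g1, g2)
after = kl(smooth(K, g1), smooth(K, g2))
```

Note that the extra term $g_2 - g_1$ in (12) keeps the divergence nonnegative even when the two arguments have different total mass, which is what allows the unnormalized functions $e_j$ to be compared later in the paper.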


2.2. Main Optimization Problem 

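Now, we assume conditions (i) through (vi) and propose to estimate $e$ by minimizing the
functional

$$ \ell(e) = \int g(\mathbf{x}) \log \frac{g(\mathbf{x})}{\sum_{j=1}^{m} (\mathcal{N}_h e_j)(\mathbf{x})}\, d\mathbf{x} + \int \sum_{j=1}^{m} e_j(\mathbf{x})\, d\mathbf{x} \tag{13} $$

subject to these conditions. In fact the only assumptions that impose any constraints on
$e$ are (ii) and (vi). Minimization of $\ell(e)$ can be written equivalently as minimization of
the penalized smoothed Kullback-Leibler divergence

$$ KL\left( g, \sum_{j=1}^{m} (\mathcal{N}_h e_j) \right) + \int \left[ \sum_{j=1}^{m} e_j - \sum_{j=1}^{m} \mathcal{N}_h e_j \right](\mathbf{x})\, d\mathbf{x}, \tag{14} $$

where in (14) the second term acts like a roughness penalty.

The discrete version of the optimization problem replaces $g(\mathbf{x})\, d\mathbf{x}$ by $dG_n(\mathbf{x})$, where
$G_n$ is the empirical distribution function of a random sample of size $n$, and in this case
we minimize

$$ \ell_{\text{discrete}}(e) = -\frac{1}{n} \sum_{i=1}^{n} \log \sum_{j=1}^{m} (\mathcal{N}_h e_j)(\mathbf{x}_i) + \int \sum_{j=1}^{m} e_j(\mathbf{x})\, d\mathbf{x}. \tag{15} $$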

Although we do not constrain $e$ to require that the sum of all $e_j$ is a density as required
by Equation (2), this property is guaranteed by the main optimization:

Theorem 2.1 Any solution $e$ to (13) or (15) satisfies

$$ \int \sum_{j=1}^{m} e_j(\mathbf{x})\, d\mathbf{x} = 1. \tag{16} $$

Proof. For any fixed $e$, differentiation shows that the function $\ell(\alpha e)$ is minimized at the
unique value

$$ \alpha = \frac{1}{\int \sum_{j=1}^{m} e_j(\mathbf{x})\, d\mathbf{x}}. \tag{17} $$

Thus if $e$ is a minimizer then Equation (16) must hold.
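The scaling argument in this proof can be illustrated numerically. Because $\int s_h(u, x)\, du = 1$ by (4), one has $\mathcal{N}_h(\alpha e_j) = \alpha\, \mathcal{N}_h e_j$, so that $\ell(\alpha e) - \ell(e) = -\log \alpha + (\alpha - 1) \int \sum_j e_j$. The sketch below, in which the function name `scaled_objective_shift` and the value `mass = 2.5` are assumptions chosen for illustration, locates the minimizing $\alpha$ by a simple grid search.

```python
import math

def scaled_objective_shift(alpha, mass):
    """l(alpha * e) - l(e) = -log(alpha) + (alpha - 1) * mass, where mass is the
    integral of sum_j e_j; this relies on N_h(alpha * e_j) = alpha * N_h(e_j)."""
    return -math.log(alpha) + (alpha - 1.0) * mass

mass = 2.5  # illustrative total mass of an unnormalised candidate e
alphas = [0.01 * k for k in range(1, 301)]
best_alpha = min(alphas, key=lambda a: scaled_objective_shift(a, mass))
improvement = scaled_objective_shift(best_alpha, mass)
```

The search returns a value of `best_alpha` equal to `1 / mass` up to the grid resolution, and the rescaled candidate `best_alpha * e` has total mass one, in agreement with (16) and (17).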


From (16), we see that for each $1 \le j \le m$, $\int e_j$ can be interpreted as the mixing
weight corresponding to the $j$th mixture component.


3. The NSMM Algorithm 


In this section, we derive an iterative algorithm, using majorization-minimization (Hunter
and Lange 2004), to minimize Equation (13). The algorithm, which we refer to as the
nonlinearly smoothed majorization-minimization (NSMM) algorithm, coincides with that
of Levine et al. (2011), despite the different derivation.


3.1. An MM Algorithm 

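Given the current estimate $e^{(0)}$ satisfying assumptions (ii) and (vi), let us define

$$ w_j^{(0)}(\mathbf{x}) = \frac{(\mathcal{N}_h e_j^{(0)})(\mathbf{x})}{\sum_{j'=1}^{m} (\mathcal{N}_h e_{j'}^{(0)})(\mathbf{x})} \tag{18} $$

for $1 \le j \le m$, noting that $\sum_{j=1}^{m} w_j^{(0)}(\mathbf{x}) = 1$. The concavity of the logarithm function
gives

$$ \begin{aligned} \ell(e) - \ell(e^{(0)}) &= -\int g(\mathbf{x}) \log \sum_{j=1}^{m} \frac{(\mathcal{N}_h e_j^{(0)})(\mathbf{x})}{\sum_{j'=1}^{m} (\mathcal{N}_h e_{j'}^{(0)})(\mathbf{x})} \cdot \frac{(\mathcal{N}_h e_j)(\mathbf{x})}{(\mathcal{N}_h e_j^{(0)})(\mathbf{x})}\, d\mathbf{x} + \int \left( \sum_{j=1}^{m} e_j - \sum_{j=1}^{m} e_j^{(0)} \right) \\ &\le -\int g(\mathbf{x}) \sum_{j=1}^{m} w_j^{(0)}(\mathbf{x}) \cdot \log \frac{(\mathcal{N}_h e_j)(\mathbf{x})}{(\mathcal{N}_h e_j^{(0)})(\mathbf{x})}\, d\mathbf{x} + \int \left( \sum_{j=1}^{m} e_j - \sum_{j=1}^{m} e_j^{(0)} \right). \end{aligned} \tag{19} $$

So if we let

$$ b^{(0)}(e) = -\int g(\mathbf{x}) \sum_{j=1}^{m} w_j^{(0)}(\mathbf{x}) \cdot \log (\mathcal{N}_h e_j)(\mathbf{x})\, d\mathbf{x} + \int \sum_{j=1}^{m} e_j(\mathbf{x})\, d\mathbf{x}, \tag{20} $$

we obtain

$$ \ell(e) - \ell(e^{(0)}) \le b^{(0)}(e) - b^{(0)}(e^{(0)}). \tag{21} $$

Using the MM algorithm terminology of Hunter and Lange (2004), Inequality (21)
means that $b^{(0)}$ may be said to majorize $\ell$ at $e^{(0)}$, up to an additive constant. Minimizing
$b^{(0)}$ therefore yields a function $e^{(1)}$ satisfying

$$ \ell(e^{(1)}) \le \ell(e^{(0)}). \tag{22} $$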


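Thus, we now consider how to minimize $b^{(0)}(e)$, subject to the assumptions on $e$ that
were stated at the beginning. This is to be done component-wise. That is, for each $j$, we
wish to minimize

$$ \begin{aligned} b_j^{(0)}(e_j) &= -\int g(\mathbf{x}) w_j^{(0)}(\mathbf{x}) \cdot \int s_h(\mathbf{u}, \mathbf{x}) \log e_j(\mathbf{u})\, d\mathbf{u}\, d\mathbf{x} + \int e_j(\mathbf{u})\, d\mathbf{u} \\ &= -\iint g(\mathbf{x}) w_j^{(0)}(\mathbf{x}) \cdot s_h(\mathbf{u}, \mathbf{x}) \left[ \sum_{k=1}^{r} \log e_{j,k}(u_k) + \log \theta_j \right] d\mathbf{u}\, d\mathbf{x} + \theta_j \int \prod_{k=1}^{r} e_{j,k}(u_k)\, d\mathbf{u}. \end{aligned} \tag{23} $$

Up to an additive term that does not involve any $e_{j,k}$, Expression (23) is

$$ -\sum_{k=1}^{r} \iint g(\mathbf{x}) w_j^{(0)}(\mathbf{x}) \cdot s_h(u_k, x_k) \log e_{j,k}(u_k)\, du_k\, d\mathbf{x} + \theta_j \int \prod_{k=1}^{r} e_{j,k}(u_k)\, d\mathbf{u}. \tag{24} $$

For any $k$ in $1, \ldots, r$, we can view Expression (24) as an integral with respect to $du_k$.
Differentiating the integrand with respect to $e_{j,k}(u_k)$ and equating the result to zero,
Fubini's Theorem gives

$$ e_{j,k}(u_k) \propto \int g(\mathbf{x}) w_j^{(0)}(\mathbf{x}) \cdot s_h(u_k, x_k)\, d\mathbf{x}. \tag{25} $$

This tells us, according to (3), that

$$ e_j(\mathbf{u}) = \alpha_j \prod_{k=1}^{r} \int g(\mathbf{x}) w_j^{(0)}(\mathbf{x}) \cdot s_h(u_k, x_k)\, d\mathbf{x} \tag{26} $$

for some constant $\alpha_j$. To find $\alpha_j$, we plug (26) into (23) and differentiate with respect
to $\alpha_j$, which gives as a final result

$$ e_j^{(1)}(\mathbf{u}) = \frac{\prod_{k=1}^{r} \int g(\mathbf{x}) w_j^{(0)}(\mathbf{x}) \cdot s_h(u_k, x_k)\, d\mathbf{x}}{\left[ \int g(\mathbf{x}) w_j^{(0)}(\mathbf{x})\, d\mathbf{x} \right]^{r-1}}. \tag{27} $$

October 29, 2015
Journal of Nonparametric Statistics TechNsmm'jns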


To summarize, our NSMM algorithm starts with some initial estimate $e^{(0)}$ satisfying
assumptions (ii) and (vi), then iterates according to

$$ e^{(p+1)}(\mathbf{u}) = G(e^{(p)})(\mathbf{u}), \tag{28} $$

where $G(\cdot)$ performs the one-step update of Equation (27). In practical terms, NSMM
is identical to the nonparametric maximum smoothed likelihood algorithm proposed in
Levine et al. (2011). However, our derivation uses a simpler parameter space and the
normalization involved in each step of the algorithm is now a result of optimization. We
have thus rigorously derived the NSMM algorithm as a special case of the
majorization-minimization method.

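In the discrete case, we replace the density $g(\cdot)$ by the empirical distribution defined
by the sample; thus, the algorithm iterates according to the following until convergence,
assuming $e^{(p)}$ is the current step estimate:

Majorization Step: For $1 \le i \le n$, $1 \le j \le m$, compute

$$ w_j^{(p)}(\mathbf{x}_i) = \frac{(\mathcal{N}_h e_j^{(p)})(\mathbf{x}_i)}{\sum_{j'=1}^{m} (\mathcal{N}_h e_{j'}^{(p)})(\mathbf{x}_i)}. \tag{29} $$

Minimization Step: Let

$$ e_j^{(p+1)}(\mathbf{u}) = \frac{\prod_{k=1}^{r} \sum_{i=1}^{n} w_j^{(p)}(\mathbf{x}_i) \cdot s_h(u_k, x_{i,k})}{n \left[ \sum_{i=1}^{n} w_j^{(p)}(\mathbf{x}_i) \right]^{r-1}}. \tag{30} $$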
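The two steps above can be sketched in a few lines of plain Python. This is a minimal illustration under several assumptions that are not part of the paper: a Gaussian kernel evaluated on an equally spaced grid, row renormalization so that each discretized kernel sums to one (a stand-in for condition (4)), storage of each $e_j$ as a weight `lam[j]` times marginal densities `f[j][k]`, and a random initialization. The names `kernel_weights` and `nsmm` are hypothetical.

```python
import math
import random

def kernel_weights(grid, coords, h):
    """Row i holds s_h(grid[g], coords[i]) * du, rescaled so the row sums to 1."""
    rows = []
    for x in coords:
        raw = [math.exp(-0.5 * ((u - x) / h) ** 2) for u in grid]
        total = sum(raw)
        rows.append([v / total for v in raw])
    return rows

def nsmm(data, m, grid, h, n_iter, seed=0):
    """Discrete NSMM on a grid; e_j(u) = lam[j] * prod_k f[j][k][index of u_k]."""
    rng = random.Random(seed)
    n, r, du = len(data), len(data[0]), grid[1] - grid[0]
    A = [kernel_weights(grid, [x[k] for x in data], h) for k in range(r)]

    def minimization_step(w):
        # Update (30): lam_j = sum_i w_ij / n, f_jk = weighted kernel density.
        lam, f = [], []
        for j in range(m):
            wj = sum(w[i][j] for i in range(n))
            lam.append(wj / n)
            f.append([[sum(w[i][j] * A[k][i][g] for i in range(n)) / (wj * du)
                       for g in range(len(grid))] for k in range(r)])
        return lam, f

    def log_smoothed(lam, f):
        # log (N_h e_j)(x_i) = log lam_j + sum_k sum_g A[k][i][g] * log f[j][k][g]
        logf = [[[math.log(v) for v in f[j][k]] for k in range(r)] for j in range(m)]
        out = []
        for i in range(n):
            row = []
            for j in range(m):
                s = math.log(lam[j])
                for k in range(r):
                    s += sum(a * b for a, b in zip(A[k][i], logf[j][k]))
                row.append(s)
            out.append(row)
        return out

    # Random initial weights, then one minimization step to get a valid start.
    w = []
    for _ in range(n):
        raw = [rng.random() + 0.1 for _ in range(m)]
        w.append([v / sum(raw) for v in raw])
    lam, f = minimization_step(w)

    history = []
    for _ in range(n_iter):
        logs = log_smoothed(lam, f)
        w, log_lik = [], 0.0
        for row in logs:                      # majorization step (29)
            top = max(row)
            ex = [math.exp(v - top) for v in row]
            total = sum(ex)
            w.append([v / total for v in ex])
            log_lik += top + math.log(total)
        # Objective (15); the integral of sum_j e_j equals sum(lam) here.
        history.append(-log_lik / n + sum(lam))
        lam, f = minimization_step(w)         # minimization step (30)
    return lam, f, history

rng = random.Random(1)
data = ([[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(30)]
        + [[rng.gauss(3.0, 1.0) for _ in range(3)] for _ in range(30)])
grid = [-5.0 + 0.25 * g for g in range(53)]   # covers [-5, 8]
lam, f, history = nsmm(data, m=2, grid=grid, h=0.5, n_iter=15)
```

In this discretized setting the recorded values in `history` should not increase from one iteration to the next, which mirrors Inequality (22), and the weights `lam` sum to one at every step, which mirrors Theorem 2.1.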


3.2. The Projection-Multiplication Operator 

The NSMM algorithm of Section 3.1 can be summarized in an elegant way using the
projection-multiplication operator, defined as follows. For any nonnegative function $f$
on $\mathbb{R}^r$ such that $\int f > 0$, and $\mathbf{x} = (x_1, x_2, \ldots, x_r)^\top \in \mathbb{R}^r$, let the operator $P$, which
factorizes $f$ as a product of marginal functions on $\mathbb{R}^r$, be defined by

$$ (Pf)(\mathbf{x}) = \frac{\prod_{k=1}^{r} \int_{\mathbb{R}^{r-1}} f(\mathbf{x})\, dx_1\, dx_2 \cdots dx_{k-1}\, dx_{k+1} \cdots dx_r}{\left[ \int f \right]^{r-1}}. \tag{31} $$

When $f$ is a density on $\mathbb{R}^r$, the right side of (31) simplifies because the denominator is
1. As the next lemma points out, the $P$ operator commutes with the $\mathcal{S}_h$ operator.

Lemma 3.1 Assume $f$ is an integrable nonnegative function on $\mathbb{R}^r$ with support in a
compact set $\Omega$. We have

$$ (P \circ \mathcal{S}_h) f = (\mathcal{S}_h \circ P) f. \tag{32} $$

Proof. See Appendix (A.1).
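The commutation property in Lemma 3.1 can be checked on a small grid. The following sketch treats the case $r = 2$ with unit cell size; the matrix `K` stands in for the kernel $s_h$ and is chosen so that each column sums to one (a discrete analogue of $\int s_h(x, u)\, dx = 1$). The helper names `project_multiply` and `smooth2d` and all numerical values are assumptions made for illustration.

```python
def project_multiply(f):
    """P of (31) for r = 2 on a grid with unit cell size: the product of the
    two marginals divided by the total mass (exponent r - 1 = 1)."""
    rows = [sum(row) for row in f]
    cols = [sum(f[a][b] for a in range(len(f))) for b in range(len(f[0]))]
    total = sum(rows)
    return [[ra * cb / total for cb in cols] for ra in rows]

def smooth2d(K, f):
    """(S f)[x1][x2] = sum over (u1, u2) of K[x1][u1] * K[x2][u2] * f[u1][u2]."""
    n = len(f)
    half = [[sum(K[x1][u1] * f[u1][u2] for u1 in range(n)) for u2 in range(n)]
            for x1 in range(n)]
    return [[sum(K[x2][u2] * half[x1][u2] for u2 in range(n)) for x2 in range(n)]
            for x1 in range(n)]

K = [[0.7, 0.2, 0.0],
     [0.3, 0.6, 0.5],
     [0.0, 0.2, 0.5]]          # each column sums to 1
f = [[1.0, 2.0, 0.5],
     [0.0, 3.0, 1.0],
     [4.0, 0.5, 2.0]]          # nonnegative, not of product form
left = project_multiply(smooth2d(K, f))     # (P o S_h) f
right = smooth2d(K, project_multiply(f))    # (S_h o P) f
gap = max(abs(left[a][b] - right[a][b]) for a in range(3) for b in range(3))
```

Here `gap` is zero up to rounding error, and applying `project_multiply` to an array that is already a product of its marginals leaves it unchanged, which is why $P$ can be thought of as a projection onto product-form functions.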


Lemma 3.1 implies that $G(\cdot)$, which performs the one-step update of the NSMM
algorithm, can be expressed concisely as

$$ \left( G(e^{(p)}) \right)_j(\mathbf{u}) = \left[ P \circ \mathcal{S}_h (g \cdot w_j^{(p)}) \right](\mathbf{u}) \tag{33} $$

for $1 \le j \le m$. In the discrete or finite-sample case, $g(\cdot)$ places weight $1/n$ at each
sampled point. Equation (33) therefore suggests a geometric intuition of $G(\cdot)$ in the
discrete case, which is illustrated in Figure 1.

Figure 1. Illustration of the $G(\cdot)$ operator for a finite ($n = 16$) sample in the case $r = 2$: The operator first
smooths the weighted dataset and then applies the $P$ operator to it, yielding the product of the smoothed
marginals, shown here in red, as the density estimator at the next iteration.


3.3. Sharpened Monotonicity 


For any MM algorithm, including any EM algorithm (Dempster et al. 1977), the well-known
monotonicity property of Inequality (22) says that the value of the objective
function moves, at each iteration, in the direction of being optimized (Hunter and
Lange 2004). For the NSMM algorithm, this descent property was first proved in Levine
et al. (2011).

Proposition 3.2 In the continuous (infinite-sample) version of the NSMM algorithm,
at any step $p$, we have

$$ \ell(e^{(p)}) - \ell(e^{(p+1)}) = \sum_{j=1}^{m} KL(e_j^{(p+1)}, e_j^{(p)}) + \sum_{j=1}^{m} KL(g \cdot w_j^{(p)}, g \cdot w_j^{(p+1)}). \tag{34} $$

Proof. See Appendix (A.2).

Remark 1 The discrete version of Proposition 3.2 is

$$ \ell_{\text{discrete}}(e^{(p)}) - \ell_{\text{discrete}}(e^{(p+1)}) = \sum_{j=1}^{m} KL(e_j^{(p+1)}, e_j^{(p)}) + \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m} w_j^{(p)}(\mathbf{x}_i) \log \frac{w_j^{(p)}(\mathbf{x}_i)}{w_j^{(p+1)}(\mathbf{x}_i)}. \tag{35} $$

Proposition 3.2 implies the following corollary:

Corollary 3.3 In the NSMM algorithm, at any step $p$, we have

$$ \ell(e^{(p)}) - \ell(e^{(p+1)}) \ge \sum_{j=1}^{m} KL(e_j^{(p+1)}, e_j^{(p)}). \tag{36} $$


Inequality (36) may be established directly, using Jensen's Inequality, and we include
this proof separately as an appendix because it is interesting in its own right.

Proof. Direct proof of Corollary 3.3 can be found in Appendix (A.3).

Corollary 3.3 implies the following two novel results. First, Corollary 3.4 guarantees
that we only need to search among fixed points of the NSMM algorithm for a solution to
the minimization problem. This gives a theoretical basis for using the NSMM algorithm
for this estimation problem.

Corollary 3.4 Any minimizer $e$ of $\ell(e)$ or $\ell_{\text{discrete}}(e)$ is a fixed point of the
corresponding NSMM algorithm.

Proof. Since the right side of (36) is strictly positive when $e_j^{(p+1)} \ne e_j^{(p)}$ for any $j$, a
necessary condition for $e^{(p)}$ to minimize $\ell(e)$ is that $e^{(p+1)} = e^{(p)}$, i.e., that $e^{(p)}$ is a fixed
point of the algorithm. The same reasoning works for $\ell_{\text{discrete}}(e)$. ■


Second, Corollary 3.5 ensures among other things that the distance between estimates
of adjacent steps from an NSMM sequence will tend to zero, a result that is used
in the next section.

Corollary 3.5 In the NSMM algorithm, at any step $p$, we have

$$ \ell(e^{(p)}) - \ell(e^{(p+1)}) \ge \sum_{j=1}^{m} \frac{1}{4} \left\| e_j^{(p+1)} - e_j^{(p)} \right\|_1^2, \tag{37} $$

where $\| \cdot \|_1$ denotes the $L^1$ norm.

Proof. The result follows from Inequality (3.21) in Eggermont and LaRiccia (2001), which
states that

$$ KL(g_1, g_2) \ge \frac{1}{4} \| g_1 - g_2 \|_1^2 \tag{38} $$

for functions $g_1$ and $g_2$.

4. Existence of a Solution to the Maximization Problem 


In this section, we verify the existence of at least one solution to the main optimization
problem of Section 2.2, a novel result as far as we are aware.

Lemma 4.1 Given $e$ satisfying assumption (ii), we have $\ell(e) \ge 1$. In the discrete case,
we have $\ell_{\text{discrete}}(e) \ge -\log M_2$.


Proof. See Appendix (A.4). 


Together, Corollary 3.3 and Lemma 4.1 imply the following corollary.

Corollary 4.2 In the NSMM algorithm, $\ell(e^{(p)})$ will tend to a finite limit as $p$ goes to
infinity. This result also holds in the discrete case.

We now establish some technical results that lead to the main conclusion of this section,
namely, the existence of a minimizer of both $\ell(e)$ and $\ell_{\text{discrete}}(e)$.

Lemma 4.3 Assume conditions (i) through (vi). For each $j$, $1 \le j \le m$, any NSMM
sequence $\{e_j^{(p)}\}_{1 \le p < \infty}$ is uniformly bounded and equicontinuous on $\Omega$. This result also
holds in the discrete case.


Proof. See Appendix (A.5). 


More generally, Lemma 4.3 implies the following result:

Lemma 4.4 For $e$ satisfying assumptions (i) through (vi), in either the discrete or the
continuous case, for $1 \le j \le m$ and $\mathbf{u}, \mathbf{v} \in \Omega$, we have

$$ (G(e))_j(\mathbf{u}) \le M_2^r, \tag{39} $$

$$ \left| (G(e))_j(\mathbf{u}) - (G(e))_j(\mathbf{v}) \right| \le \left[ B \cdot M_2^{r-1} \right] \cdot \| \mathbf{u} - \mathbf{v} \|_1. \tag{40} $$

The following lemma establishes a sort of lower semi-continuity of the functional $\ell(\cdot)$,
which will be needed in proving existence of at least one solution to the main optimization
problem.

Lemma 4.5 Let $\gamma_j^{(p)} \in L^1(\mathbb{R}^r)$ be nonnegative and with support in $\Omega$ for each $p$ and $j$,
where $0 \le p \le \infty$ and $1 \le j \le m$. Assume each $\gamma_j^{(p)}$ uniformly converges to $\gamma_j^{(\infty)}$ in
$L^1(\mathbb{R}^r)$ and that all $\gamma_j^{(p)}$ are bounded from above by a constant $Q > 1$. Let $\gamma^{(p)}$ and $\gamma^{(\infty)}$
represent $(\gamma_1^{(p)}, \ldots, \gamma_m^{(p)})$ and $(\gamma_1^{(\infty)}, \ldots, \gamma_m^{(\infty)})$, respectively. Then we have

$$ \ell(\gamma^{(\infty)}) \le \liminf_{p \to \infty} \ell(\gamma^{(p)}). \tag{41} $$

This is also true for the discrete case.

Proof. See Appendix (A.6). ■

Theorem 4.6 Under assumptions (i) through (vi), there exists at least one solution to 
the main optimization problem (13). This is also true in the discrete case. 


Proof. See Appendix (A.7). ■ 

To conclude this section, we discuss the rationale behind assumption (vi) and related
issues such as why the $\mathcal{N}_h$ operator is well-defined as we applied it.

Lemma 4.7 In an NSMM sequence $\{e^{(p)}\}_{0 \le p < \infty}$, the $e_j^{(p)}$ are all strictly positive for all
$j$. Moreover, if we let $\lambda_j^{(p)} = \int e_j^{(p)}(\mathbf{u})\, d\mathbf{u}$ and $f_j^{(p)}(\mathbf{u}) = e_j^{(p)}(\mathbf{u}) / \lambda_j^{(p)}$, then

$$ (M_1(h))^r \le f_j^{(p)}(\mathbf{u}) \le M_2^r \tag{42} $$

for all $\mathbf{u} \in \Omega$, $p > 0$, and $1 \le j \le m$.

Proof. See Appendix (A.8).

Lemma 4.7 shows why in assumption (vi) we require the densities of each
mixture component to be bounded below by $(M_1(h))^r$ and guarantees that dividing by
zero never occurs in any NSMM sequence.


5. Discussion 


Starting from the conditional independence finite multivariate mixture model as set forth
in the work of Benaglia et al. (2009) and Levine et al. (2011), this manuscript proposes
an equivalent but simplified parameterization. This reformulation leads to a novel and
mathematically coherent version of the penalized Kullback-Leibler divergence as the main
optimization criterion for the estimation of the parameters.

In this new framework, certain constraints that were previously imposed on the parameter
space may be eliminated, and the solutions obtained may be shown to follow these
constraints naturally. These contributions help to rigorously justify the nonparametric
maximum smoothed likelihood (npMSL) estimation algorithm established by
Levine et al. (2011).

As part of our investigation, we have discovered several new results, including a sharper
monotonicity property of the NSMM algorithm that could ultimately contribute to future
investigations of the true convergence rate or other asymptotic properties of the
algorithm. We also prove, for the first time, the existence of at least one solution for the
estimation problem of this model.

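Because of the elegant simplicity and mathematical tractability associated with this
framework, we believe the results herein will serve as the basis for future research on this
useful nonparametric model.

Appendix A. Mathematical Proofs

A.1. Proof of Lemma 3.1

Proof. Since both $P \circ \mathcal{S}_h$ and $\mathcal{S}_h \circ P$ are positively homogeneous of degree one in $f$,
we only need to consider the case where $f$ is a density function. By Fubini's Theorem and
Equation (31),

$$ \begin{aligned} \left[ (P \circ \mathcal{S}_h) f \right](\mathbf{x}) &= \prod_{k=1}^{r} \int_{\mathbb{R}^{r-1}} \left[ \int_{\mathbb{R}^r} s_h(\mathbf{x}, \mathbf{u}) f(\mathbf{u})\, d\mathbf{u} \right] dx_1\, dx_2 \cdots dx_{k-1}\, dx_{k+1} \cdots dx_r \\ &= \prod_{k=1}^{r} \int_{\mathbb{R}^r} \left[ \int_{\mathbb{R}^{r-1}} s_h(\mathbf{x}, \mathbf{u})\, dx_1\, dx_2 \cdots dx_{k-1}\, dx_{k+1} \cdots dx_r \right] f(\mathbf{u})\, d\mathbf{u} \\ &= \prod_{k=1}^{r} \int_{\mathbb{R}} s_h(x_k, u_k) \left[ \int_{\mathbb{R}^{r-1}} f(\mathbf{u})\, du_1\, du_2 \cdots du_{k-1}\, du_{k+1} \cdots du_r \right] du_k \\ &= \int_{\mathbb{R}^r} s_h(\mathbf{x}, \mathbf{u}) \prod_{k=1}^{r} \left[ \int_{\mathbb{R}^{r-1}} f(\mathbf{u})\, du_1\, du_2 \cdots du_{k-1}\, du_{k+1} \cdots du_r \right] d\mathbf{u} \\ &= \left[ (\mathcal{S}_h \circ P) f \right](\mathbf{x}). \end{aligned} $$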


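A.2. Proof of Proposition 3.2

Proof. Direct evaluation and the definition of Kullback-Leibler divergence in
Equation (12) give

$$ \begin{aligned} \ell(e^{(p)}) - \ell(e^{(p+1)}) &= \int g(\mathbf{x}) \log \frac{\sum_{j'=1}^{m} (\mathcal{N}_h e_{j'}^{(p+1)})(\mathbf{x})}{\sum_{j'=1}^{m} (\mathcal{N}_h e_{j'}^{(p)})(\mathbf{x})}\, d\mathbf{x} = \sum_{j=1}^{m} \int g(\mathbf{x}) w_j^{(p)}(\mathbf{x}) \log \frac{\sum_{j'=1}^{m} (\mathcal{N}_h e_{j'}^{(p+1)})(\mathbf{x})}{\sum_{j'=1}^{m} (\mathcal{N}_h e_{j'}^{(p)})(\mathbf{x})}\, d\mathbf{x} \\ &= \sum_{j=1}^{m} \int g(\mathbf{x}) w_j^{(p)}(\mathbf{x}) \log \left[ \frac{(\mathcal{N}_h e_j^{(p+1)})(\mathbf{x})}{(\mathcal{N}_h e_j^{(p)})(\mathbf{x})} \cdot \frac{w_j^{(p)}(\mathbf{x})}{w_j^{(p+1)}(\mathbf{x})} \right] d\mathbf{x} \\ &= \sum_{j=1}^{m} \int g(\mathbf{x}) w_j^{(p)}(\mathbf{x}) \log \frac{(\mathcal{N}_h e_j^{(p+1)})(\mathbf{x})}{(\mathcal{N}_h e_j^{(p)})(\mathbf{x})}\, d\mathbf{x} + \sum_{j=1}^{m} \int g(\mathbf{x}) w_j^{(p)}(\mathbf{x}) \log \frac{w_j^{(p)}(\mathbf{x})}{w_j^{(p+1)}(\mathbf{x})}\, d\mathbf{x} \\ &= \sum_{j=1}^{m} KL(e_j^{(p+1)}, e_j^{(p)}) + \sum_{j=1}^{m} \int \left[ g(\mathbf{x}) w_j^{(p)}(\mathbf{x}) \log \frac{g(\mathbf{x}) w_j^{(p)}(\mathbf{x})}{g(\mathbf{x}) w_j^{(p+1)}(\mathbf{x})} + g(\mathbf{x}) w_j^{(p+1)}(\mathbf{x}) - g(\mathbf{x}) w_j^{(p)}(\mathbf{x}) \right] d\mathbf{x} \\ &= \sum_{j=1}^{m} KL(e_j^{(p+1)}, e_j^{(p)}) + \sum_{j=1}^{m} KL(g \cdot w_j^{(p)}, g \cdot w_j^{(p+1)}). \end{aligned} \tag{A1} $$

Here the terms $\int \sum_j e_j^{(p)}$ and $\int \sum_j e_j^{(p+1)}$ cancel because both iterates satisfy (16), and
the second equality uses $\sum_j w_j^{(p)}(\mathbf{x}) = 1$.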


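A.3. Direct Proof of Corollary 3.3

Proof. If we define $\lambda_j = \int e_j(\mathbf{x})\, d\mathbf{x}$ and $f_j(\mathbf{x}) = e_j(\mathbf{x}) / \lambda_j$, with marginals $f_{j,k}$ so that
$f_j(\mathbf{x}) = \prod_{k=1}^{r} f_{j,k}(x_k)$, then Jensen's inequality together with some simplification give

$$ \begin{aligned} \ell(e^{(p)}) - \ell(e^{(p+1)}) &= \int g(\mathbf{x}) \log \frac{\sum_{j'=1}^{m} (\mathcal{N}_h e_{j'}^{(p+1)})(\mathbf{x})}{\sum_{j'=1}^{m} (\mathcal{N}_h e_{j'}^{(p)})(\mathbf{x})}\, d\mathbf{x} = \int g(\mathbf{x}) \log \sum_{j=1}^{m} \frac{(\mathcal{N}_h e_j^{(p)})(\mathbf{x})}{\sum_{j'=1}^{m} (\mathcal{N}_h e_{j'}^{(p)})(\mathbf{x})} \cdot \frac{(\mathcal{N}_h e_j^{(p+1)})(\mathbf{x})}{(\mathcal{N}_h e_j^{(p)})(\mathbf{x})}\, d\mathbf{x} \\ &\ge \int g(\mathbf{x}) \sum_{j=1}^{m} \frac{(\mathcal{N}_h e_j^{(p)})(\mathbf{x})}{\sum_{j'=1}^{m} (\mathcal{N}_h e_{j'}^{(p)})(\mathbf{x})} \log \frac{(\mathcal{N}_h e_j^{(p+1)})(\mathbf{x})}{(\mathcal{N}_h e_j^{(p)})(\mathbf{x})}\, d\mathbf{x} \\ &= \sum_{j=1}^{m} \int g(\mathbf{x}) w_j^{(p)}(\mathbf{x}) \log \frac{\lambda_j^{(p+1)} \prod_{k=1}^{r} (\mathcal{N}_h f_{j,k}^{(p+1)})(x_k)}{\lambda_j^{(p)} \prod_{k=1}^{r} (\mathcal{N}_h f_{j,k}^{(p)})(x_k)}\, d\mathbf{x} \\ &= \sum_{j=1}^{m} \int g(\mathbf{x}) w_j^{(p)}(\mathbf{x}) \left[ \log \frac{\lambda_j^{(p+1)}}{\lambda_j^{(p)}} + \sum_{k=1}^{r} \int s_h(u_k, x_k) \log \frac{f_{j,k}^{(p+1)}(u_k)}{f_{j,k}^{(p)}(u_k)}\, du_k \right] d\mathbf{x} \\ &= \sum_{j=1}^{m} \lambda_j^{(p+1)} \log \frac{\lambda_j^{(p+1)}}{\lambda_j^{(p)}} + \sum_{j=1}^{m} \sum_{k=1}^{r} \int \left[ \int g(\mathbf{x}) w_j^{(p)}(\mathbf{x}) s_h(u_k, x_k)\, d\mathbf{x} \right] \log \frac{f_{j,k}^{(p+1)}(u_k)}{f_{j,k}^{(p)}(u_k)}\, du_k \\ &= \sum_{j=1}^{m} \lambda_j^{(p+1)} \log \frac{\lambda_j^{(p+1)}}{\lambda_j^{(p)}} + \sum_{j=1}^{m} \sum_{k=1}^{r} \int \lambda_j^{(p+1)} f_{j,k}^{(p+1)}(u_k) \log \frac{f_{j,k}^{(p+1)}(u_k)}{f_{j,k}^{(p)}(u_k)}\, du_k \\ &= \sum_{j=1}^{m} \lambda_j^{(p+1)} \log \frac{\lambda_j^{(p+1)}}{\lambda_j^{(p)}} + \sum_{j=1}^{m} \int \lambda_j^{(p+1)} \left( \prod_{k=1}^{r} f_{j,k}^{(p+1)}(u_k) \right) \log \frac{\prod_{k=1}^{r} f_{j,k}^{(p+1)}(u_k)}{\prod_{k=1}^{r} f_{j,k}^{(p)}(u_k)}\, d\mathbf{u} \\ &= \sum_{j=1}^{m} \int \lambda_j^{(p+1)} \left( \prod_{k=1}^{r} f_{j,k}^{(p+1)}(u_k) \right) \log \frac{\lambda_j^{(p+1)} \prod_{k=1}^{r} f_{j,k}^{(p+1)}(u_k)}{\lambda_j^{(p)} \prod_{k=1}^{r} f_{j,k}^{(p)}(u_k)}\, d\mathbf{u} \\ &= \sum_{j=1}^{m} \int e_j^{(p+1)}(\mathbf{u}) \log \frac{e_j^{(p+1)}(\mathbf{u})}{e_j^{(p)}(\mathbf{u})}\, d\mathbf{u} \\ &= \sum_{j=1}^{m} KL(e_j^{(p+1)}, e_j^{(p)}). \end{aligned} $$

Here the first and last equalities use the fact that $\int \sum_j e_j^{(p)} = \int \sum_j e_j^{(p+1)} = 1$ for iterates
satisfying (16), and the intermediate steps use $\lambda_j^{(p+1)} = \int g(\mathbf{x}) w_j^{(p)}(\mathbf{x})\, d\mathbf{x}$ and
$\lambda_j^{(p+1)} f_{j,k}^{(p+1)}(u_k) = \int g(\mathbf{x}) w_j^{(p)}(\mathbf{x}) s_h(u_k, x_k)\, d\mathbf{x}$, both of which follow from (27).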


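A.4. Proof of Lemma 4.1

Proof. For each $j$, $1 \le j \le m$, and $\mathbf{x} \in \Omega$, Jensen's Inequality gives

$$ \begin{aligned} (\mathcal{N}_h e_j)(\mathbf{x}) &= \exp \left[ \int s_h(\mathbf{u}, \mathbf{x}) \log e_j(\mathbf{u})\, d\mathbf{u} \right] \\ &\le \int s_h(\mathbf{u}, \mathbf{x}) \exp \left[ \log e_j(\mathbf{u}) \right] d\mathbf{u} \\ &= \int s_h(\mathbf{u}, \mathbf{x}) e_j(\mathbf{u})\, d\mathbf{u}. \end{aligned} \tag{A2} $$

Integrate both sides with respect to $\mathbf{x}$, then use Fubini's Theorem to obtain

$$ \int (\mathcal{N}_h e_j)(\mathbf{x})\, d\mathbf{x} \le \int e_j(\mathbf{u})\, d\mathbf{u}. \tag{A3} $$

Summing over $j$, we get

$$ \int \sum_{j=1}^{m} (\mathcal{N}_h e_j)(\mathbf{x})\, d\mathbf{x} \le \int \sum_{j=1}^{m} e_j(\mathbf{u})\, d\mathbf{u}. \tag{A4} $$

Therefore, all three terms on the right hand side of

$$ \ell(e) = KL\left( g, \sum_{j=1}^{m} (\mathcal{N}_h e_j) \right) + 1 + \int \sum_{j=1}^{m} \left[ e_j(\mathbf{x}) - (\mathcal{N}_h e_j)(\mathbf{x}) \right] d\mathbf{x} \tag{A5} $$

are nonnegative and the middle term is 1, which implies that $\ell(\cdot)$ is always bounded
below by 1.

For the discrete case, Jensen's Inequality gives

$$ \begin{aligned} \ell_{\text{discrete}}(e) &= -\frac{1}{n} \sum_{i=1}^{n} \log \sum_{j=1}^{m} (\mathcal{N}_h e_j)(\mathbf{x}_i) + \int \sum_{j=1}^{m} e_j(\mathbf{x})\, d\mathbf{x} \\ &\ge -\frac{1}{n} \sum_{i=1}^{n} \log \sum_{j=1}^{m} (\mathcal{N}_h e_j)(\mathbf{x}_i) \\ &\ge -\frac{1}{n} \sum_{i=1}^{n} \log \sum_{j=1}^{m} (\mathcal{S}_h^* e_j)(\mathbf{x}_i) \ge -\frac{1}{n} \sum_{i=1}^{n} \log M_2 = -\log M_2. \end{aligned} \tag{A6} $$

A.5. Proof of Lemma 4.3

Proof. In the continuous case, for $p \ge 1$ and $\mathbf{u} \in \Omega$,

$$ e_j^{(p)}(\mathbf{u}) = \frac{\prod_{k=1}^{r} \int g(\mathbf{x}) w_j^{(p-1)}(\mathbf{x}) \cdot s_h(u_k, x_k)\, d\mathbf{x}}{\left[ \int g(\mathbf{x}) w_j^{(p-1)}(\mathbf{x})\, d\mathbf{x} \right]^{r-1}} \le M_2^r \int g(\mathbf{x}) w_j^{(p-1)}(\mathbf{x})\, d\mathbf{x} \le M_2^r. $$

Thus $\{e_j^{(p)}\}_{1 \le p < \infty}$ is uniformly bounded. Also, for any $\mathbf{u}$ in the interior of $\Omega$ and any $l$,
$1 \le l \le r$,

$$ \begin{aligned} \left| \frac{\partial}{\partial u_l} e_j^{(p)}(\mathbf{u}) \right| &= \frac{\left| \prod_{k \ne l} \int g(\mathbf{x}) w_j^{(p-1)}(\mathbf{x}) s_h(u_k, x_k)\, d\mathbf{x} \cdot \int g(\mathbf{x}) w_j^{(p-1)}(\mathbf{x}) \frac{\partial}{\partial u_l} s_h(u_l, x_l)\, d\mathbf{x} \right|}{\left[ \int g(\mathbf{x}) w_j^{(p-1)}(\mathbf{x})\, d\mathbf{x} \right]^{r-1}} \\ &\le B \cdot M_2^{r-1} \int g(\mathbf{x}) w_j^{(p-1)}(\mathbf{x})\, d\mathbf{x} \\ &\le B \cdot M_2^{r-1}. \end{aligned} \tag{A7} $$

By the Dominated Convergence Theorem, the above differentiation under the integral
is allowed because the term $\left| g(\mathbf{x}) w_j^{(p-1)}(\mathbf{x}) \frac{\partial}{\partial u_l} s_h(u_l, x_l) \right|$ is uniformly bounded by the
integrable function $B \cdot g(\mathbf{x})$.

Now by the Mean Value Theorem for functions of several variables, for any $\mathbf{u}, \mathbf{v} \in \Omega$,
there is some $d \in (0, 1)$ such that

$$ e_j^{(p)}(\mathbf{u}) - e_j^{(p)}(\mathbf{v}) = \nabla e_j^{(p)}[(1 - d)\mathbf{v} + d\mathbf{u}] \cdot (\mathbf{u} - \mathbf{v}). \tag{A8} $$

So

$$ \left| e_j^{(p)}(\mathbf{u}) - e_j^{(p)}(\mathbf{v}) \right| \le \left[ B \cdot M_2^{r-1} \right] \cdot \| \mathbf{u} - \mathbf{v} \|_1, \tag{A9} $$

which shows that $\{e_j^{(p)}\}_{1 \le p < \infty}$ is equicontinuous on $\Omega$ in the $\| \cdot \|_1$ norm.

This proof can be readily adapted to the discrete case by replacing the integrals by
summations. ■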


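A.6. Proof of Lemma 4.5

Proof. We first consider the continuous case. In the following, Fatou's Lemma will be
applied twice to get the desired result. First, we show by Jensen's Inequality that all
$\mathcal{N}_h \gamma_j^{(p)}$ are bounded from above by $Q$:

$$ (\mathcal{N}_h \gamma_j^{(p)})(\mathbf{x}) = \exp \left[ \int s_h(\mathbf{u}, \mathbf{x}) \log \gamma_j^{(p)}(\mathbf{u})\, d\mathbf{u} \right] \le \exp \left[ \int s_h(\mathbf{u}, \mathbf{x}) \log Q\, d\mathbf{u} \right] = Q. \tag{A10} $$

Now, for any fixed value of $\mathbf{x}$, the nonnegative measurable function
$s_h(\cdot, \mathbf{x})[Q - \log \gamma_j^{(p)}(\cdot)]$ converges to $s_h(\cdot, \mathbf{x})[Q - \log \gamma_j^{(\infty)}(\cdot)]$ pointwise. These functions
are allowed to attain the value $+\infty$. By Fatou's Lemma, we have

$$ \liminf_{p \to \infty} \int s_h(\mathbf{u}, \mathbf{x}) [Q - \log \gamma_j^{(p)}(\mathbf{u})]\, d\mathbf{u} \ge \int s_h(\mathbf{u}, \mathbf{x}) [Q - \log \gamma_j^{(\infty)}(\mathbf{u})]\, d\mathbf{u}. \tag{A11} $$

Exponentiating, this implies

$$ \limsup_{p \to \infty} \exp \left[ \int s_h(\mathbf{u}, \mathbf{x}) \log \gamma_j^{(p)}(\mathbf{u})\, d\mathbf{u} \right] \le \exp \left[ \int s_h(\mathbf{u}, \mathbf{x}) \log \gamma_j^{(\infty)}(\mathbf{u})\, d\mathbf{u} \right]. \tag{A12} $$

That is,

$$ \limsup_{p \to \infty} (\mathcal{N}_h \gamma_j^{(p)})(\mathbf{x}) \le (\mathcal{N}_h \gamma_j^{(\infty)})(\mathbf{x}), \tag{A13} $$

which implies that

$$ \liminf_{p \to \infty} g(\mathbf{x}) \log \frac{g(\mathbf{x})}{\sum_{j=1}^{m} (\mathcal{N}_h \gamma_j^{(p)})(\mathbf{x})} \ge g(\mathbf{x}) \log \frac{g(\mathbf{x})}{\sum_{j=1}^{m} (\mathcal{N}_h \gamma_j^{(\infty)})(\mathbf{x})}. \tag{A14} $$

Since $a \log(a/b) + b - a$ is nonnegative for all $a, b > 0$, we have

$$ g(\mathbf{x}) \log \frac{g(\mathbf{x})}{\sum_{j=1}^{m} (\mathcal{N}_h \gamma_j^{(p)})(\mathbf{x})} \ge g(\mathbf{x}) - \sum_{j=1}^{m} (\mathcal{N}_h \gamma_j^{(p)})(\mathbf{x}) \ge -m \cdot Q. \tag{A15} $$

Thus, we can rewrite (A14) as

$$ \liminf_{p \to \infty} \left[ g(\mathbf{x}) \log \frac{g(\mathbf{x})}{\sum_{j=1}^{m} (\mathcal{N}_h \gamma_j^{(p)})(\mathbf{x})} + m \cdot Q \right] \ge g(\mathbf{x}) \log \frac{g(\mathbf{x})}{\sum_{j=1}^{m} (\mathcal{N}_h \gamma_j^{(\infty)})(\mathbf{x})} + m \cdot Q, \tag{A16} $$

so that both sides are nonnegative.

Now apply Fatou's Lemma again to obtain

$$ \begin{aligned} \liminf_{p \to \infty} \int \left[ g(\mathbf{x}) \log \frac{g(\mathbf{x})}{\sum_{j=1}^{m} (\mathcal{N}_h \gamma_j^{(p)})(\mathbf{x})} + m \cdot Q \right] d\mathbf{x} &\ge \int \liminf_{p \to \infty} \left[ g(\mathbf{x}) \log \frac{g(\mathbf{x})}{\sum_{j=1}^{m} (\mathcal{N}_h \gamma_j^{(p)})(\mathbf{x})} + m \cdot Q \right] d\mathbf{x} \\ &\ge \int \left[ g(\mathbf{x}) \log \frac{g(\mathbf{x})}{\sum_{j=1}^{m} (\mathcal{N}_h \gamma_j^{(\infty)})(\mathbf{x})} + m \cdot Q \right] d\mathbf{x}. \end{aligned} \tag{A17} $$

We conclude that

$$ \liminf_{p \to \infty} \int g(\mathbf{x}) \log \frac{g(\mathbf{x})}{\sum_{j=1}^{m} (\mathcal{N}_h \gamma_j^{(p)})(\mathbf{x})}\, d\mathbf{x} \ge \int g(\mathbf{x}) \log \frac{g(\mathbf{x})}{\sum_{j=1}^{m} (\mathcal{N}_h \gamma_j^{(\infty)})(\mathbf{x})}\, d\mathbf{x}. \tag{A18} $$

The uniform convergence of $\gamma_j^{(p)}$ to $\gamma_j^{(\infty)}$ for each $j$, together with (A18), imply

$$ \liminf_{p \to \infty} \left[ \int g(\mathbf{x}) \log \frac{g(\mathbf{x})}{\sum_{j=1}^{m} (\mathcal{N}_h \gamma_j^{(p)})(\mathbf{x})}\, d\mathbf{x} + \int \sum_{j=1}^{m} \gamma_j^{(p)}(\mathbf{x})\, d\mathbf{x} \right] \ge \int g(\mathbf{x}) \log \frac{g(\mathbf{x})}{\sum_{j=1}^{m} (\mathcal{N}_h \gamma_j^{(\infty)})(\mathbf{x})}\, d\mathbf{x} + \int \sum_{j=1}^{m} \gamma_j^{(\infty)}(\mathbf{x})\, d\mathbf{x}. \tag{A19} $$

That is,

$$ \liminf_{p \to \infty} \ell(\gamma^{(p)}) \ge \ell(\gamma^{(\infty)}), \tag{A20} $$

which establishes the desired lower semi-continuity.

The proof can be adapted to the discrete case by replacing the integrals with
summations. ■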


A.7. Proof of Theorem 4.6

Proof. By Lemma 4.1, $\tau := \inf\{\ell(e) \mid e \text{ satisfies assumptions (ii) and (vi)}\}$ is a finite
constant. So there exists a sequence $\{\psi^{(p)}\}_{0 \le p < \infty}$ satisfying assumptions (ii) and (vi) such
that

$$ \lim_{p \to \infty} \ell(\psi^{(p)}) = \tau. \tag{A21} $$

By Lemma 4.4, for each $j$, $1 \le j \le m$, the sequence $\{(G(\psi^{(p)}))_j\}_{0 \le p < \infty}$ is bounded and
equicontinuous.

By the Arzelà-Ascoli theorem, we know that $\{(G(\psi^{(p)}))_j\}_{0 \le p < \infty}$ has a uniformly
convergent subsequence. Applying this theorem $m$ times to $\{G(\psi^{(p)})\}_{0 \le p < \infty}$ we can
extract a subsequence that converges uniformly in every component. This subsequence also
satisfies (ii) and (vi).

That is, there exists a sequence $\{G(\psi^{(p_k)})\}_{0 \le k < \infty}$ such that, for each $j$, $1 \le j \le m$,
$\{(G(\psi^{(p_k)}))_j\}_{0 \le k < \infty}$ converges uniformly to a limit function in $L^1(\mathbb{R}^r)$. Denote this limit
function by $\psi_j$. As usual, let $\psi$ denote the $m$-tuple $(\psi_1, \ldots, \psi_m)$. If all components of
$\psi$ are nonzero, then $\psi$ satisfies (vi). If not, we can split up some nonzero components of
$\psi$ so that all components become nonzero, which does not change the value of $\ell(\psi)$. In
a word, we can assume that $\psi$ satisfies (vi).

Now, by Lemma 4.5 and the fact that $G$ does not increase the value of $\ell$ (see the proof
of Corollary 3.3), we have

$$ \tau \le \ell(\psi) \le \lim_{k \to \infty} \ell(G(\psi^{(p_k)})) \le \lim_{k \to \infty} \ell(\psi^{(p_k)}) = \lim_{p \to \infty} \ell(\psi^{(p)}) = \tau, \tag{A22} $$

so that $\ell(\psi) = \tau$. Apply the operator $G$ to $\psi$. By Corollary 3.3 and the fact that $\ell(\psi)$ has
already attained the infimum value in this setting, we have

$$ 0 \ge \ell(\psi) - \ell[G(\psi)] \ge \sum_{j=1}^{m} KL((G(\psi))_j, \psi_j) \ge 0. \tag{A23} $$

So for each $j$, $1 \le j \le m$, $G(\psi)_j = \psi_j$ in $L^1(\mathbb{R}^r)$. Thus in particular, by (33), $\psi$ also
satisfies assumption (ii). We have proved the existence of a solution, $\psi$, to the main
optimization problem (13).

As above, the proof can readily be adapted to the discrete case. ■


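A.8. Proof of Lemma 4.7

Proof. First, by assumption (vi), each $e_j^{(0)}$ is strictly positive on $\Omega$. So given any $\mathbf{x} \in \Omega$,

$$ (\mathcal{N}_h e_j^{(0)})(\mathbf{x}) > 0. \tag{A24} $$

Thus,

$$ w_j^{(0)}(\mathbf{x}) = \frac{(\mathcal{N}_h e_j^{(0)})(\mathbf{x})}{\sum_{j'=1}^{m} (\mathcal{N}_h e_{j'}^{(0)})(\mathbf{x})} > 0, \tag{A25} $$

which implies

$$ \int g(\mathbf{x}) w_j^{(0)}(\mathbf{x})\, d\mathbf{x} > 0. \tag{A26} $$

Now, we use induction. Assume

$$ \int g(\mathbf{x}) w_j^{(p-1)}(\mathbf{x})\, d\mathbf{x} > 0. \tag{A27} $$

We have

$$ f_j^{(p)}(\mathbf{u}) = \frac{\prod_{k=1}^{r} \int g(\mathbf{x}) w_j^{(p-1)}(\mathbf{x}) \cdot s_h(u_k, x_k)\, d\mathbf{x}}{\left[ \int g(\mathbf{x}) w_j^{(p-1)}(\mathbf{x})\, d\mathbf{x} \right]^{r}} \le M_2^r. \tag{A28} $$

Similarly,

$$ f_j^{(p)}(\mathbf{u}) = \frac{\prod_{k=1}^{r} \int g(\mathbf{x}) w_j^{(p-1)}(\mathbf{x}) \cdot s_h(u_k, x_k)\, d\mathbf{x}}{\left[ \int g(\mathbf{x}) w_j^{(p-1)}(\mathbf{x})\, d\mathbf{x} \right]^{r}} \ge (M_1(h))^r. \tag{A29} $$

Therefore,

$$ (\mathcal{N}_h e_j^{(p)})(\mathbf{x}) = \int g(\mathbf{x}) w_j^{(p-1)}(\mathbf{x})\, d\mathbf{x} \cdot \exp \left[ (\mathcal{S}_h^* \log f_j^{(p)})(\mathbf{x}) \right] \ge \int g(\mathbf{x}) w_j^{(p-1)}(\mathbf{x})\, d\mathbf{x} \cdot (M_1(h))^r > 0. \tag{A30} $$

We conclude that

$$ w_j^{(p)}(\mathbf{x}) = \frac{(\mathcal{N}_h e_j^{(p)})(\mathbf{x})}{\sum_{j'=1}^{m} (\mathcal{N}_h e_{j'}^{(p)})(\mathbf{x})} > 0, \tag{A31} $$

which gives

$$ \int g(\mathbf{x}) w_j^{(p)}(\mathbf{x})\, d\mathbf{x} > 0. \tag{A32} $$

The next step of the induction follows in the same way, and the result is established.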